Abstract. Given a bipartite graph G = (V1, V2, E) where edges take
on both positive and negative weights from set S, the maximum weighted
edge biclique problem, or S-MWEB for short, asks to find a bipartite
subgraph whose sum of edge weights is maximized. This problem has various
applications in bioinformatics, machine learning and databases and its
(in)approximability remains open. In this paper, we show that for a wide
range of choices of S, specifically when
|min S / max S| ∈ Ω(η^{δ−1/2}) ∩ O(η^{1/2−δ})
(where η = max{|V1|, |V2|}, and δ ∈ (0, 1/2]), no polynomial time
algorithm can approximate S-MWEB within a factor of n^ε for some ε > 0
unless RP = NP. This hardness result gives justification of the heuristic
approaches adopted for various applied problems in the aforementioned
areas, and indicates that good approximation algorithms are unlikely to
exist. Specifically, we give two applications by showing that: 1) finding
statistically significant biclusters in the SAMBA model, proposed in [18]
for the analysis of microarray data, is n^ε-inapproximable; and 2) no
polynomial time algorithm exists for the Minimum Description Length with
Holes problem [3] unless RP = NP.

1 Introduction 

Let G = (V1, V2, E) be an undirected bipartite graph. A biclique subgraph in G
is a complete bipartite subgraph of G, and maximum edge biclique (MEB) is the
problem of finding a biclique subgraph with the largest number of edges. MEB is
a well-known problem and has received much attention in recent years because of
its wide range of applications in areas including machine learning [T3],
management science [16] and bioinformatics, where it is found particularly
relevant in the formulation of numerous biclustering problems for biological
data analysis [5,2,18,19,17]; we refer readers to the survey by Madeira and
Oliveira [13] for a fairly extensive discussion on this. Maximum edge biclique
is shown to be NP-hard by Peeters [15] via a reduction from 3SAT. Its
approximability status, on the other hand, remains an open question despite
considerable efforts [7,8,12].¹ In particular, Feige and Kogan [8] conjectured
that maximum edge biclique

¹ Note it might be easy to confuse the MEB problem with the Bipartite Clique problem
discussed by Khot in [12]. Bipartite Clique, which is also known as Balanced Complete
is hard to approximate within a factor of n^ε for some ε > 0. In this paper, we
consider a weighted formulation of this problem defined as follows.

Definition 1. S-Maximum Weighted Edge Biclique (S-MWEB)
Instance: A complete bipartite graph G = (V1, V2, E) (throughout the paper, let
η = max{|V1|, |V2|} and n = |V1| + |V2|), a weight function w_G : E → S, where
S is a set consisting of both positive and negative integers.

Question: Find a biclique subgraph of G where the sum of weights on edges is
maximized.

A few comments are in order. First, note that it is not a loss of generality
but a technical convenience to require the graph to be complete: one can always
think of an incomplete bipartite graph as complete where non-edges are assigned
weight 0. Also note we require that both positive and negative weights be in S
at the same time because otherwise S-MWEB becomes a trivial problem.
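
To make Definition 1 concrete, the following is a minimal brute-force sketch
of S-MWEB on a tiny instance. The function names and the example matrix are
hypothetical, and we assume a biclique must have at least one vertex on each
side. The search is exponential and is meant only as an illustration.

```python
from itertools import combinations

def nonempty_subsets(items):
    """Yield every non-empty subset of items as a tuple."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)

def brute_force_mweb(weights):
    """Return (best_weight, rows, cols) for the heaviest biclique.

    weights[i][j] is the weight of edge (i, j) in a complete bipartite
    graph. Assumption: a biclique is a non-empty row set times a
    non-empty column set. Exponential time, tiny instances only.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    best = None
    for rows in nonempty_subsets(range(n_rows)):
        for cols in nonempty_subsets(range(n_cols)):
            total = sum(weights[i][j] for i in rows for j in cols)
            if best is None or total > best[0]:
                best = (total, rows, cols)
    return best

# Hypothetical {-1, 1}-MWEB instance with three vertices on each side.
example = [
    [1, 1, -1],
    [1, 1, -1],
    [-1, 1, 1],
]
print(brute_force_mweb(example))
```

If every weight were positive, the whole graph would be optimal, and if every
weight were negative, a single edge would be; this is the sense in which the
problem is trivial without mixed signs.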
Our study of S-MWEB is motivated by the problem of finding statistically
significant biclusters in microarray data analysis in the SAMBA model [18]
and the Minimum Description Length with Holes (MDLH) problem [3,4,10];
detailed discussion of the two problems can be found in Sect. 3. Our main
technical contribution in this paper is to show that if S satisfies the
condition |min S / max S| ∈ Ω(η^{δ−1/2}) ∩ O(η^{1/2−δ}), where δ > 0 is any
arbitrarily small constant, then no polynomial time algorithm can approximate
S-MWEB within a factor of n^ε for some ε > 0 unless RP = NP. This result
enables us to answer open questions regarding the hardness of the SAMBA model
and the MDLH problem. Since maximum edge biclique can be characterized as a
special case of S-MWEB with S = {−η, 1}, the n^ε-inapproximability result also
provides interesting insights into the conjectured n^ε-inapproximability [8]
of maximum edge biclique.

The rest of the paper is organized in three sections. In Sect. 2, we present
the main technical result by proving the aforementioned inapproximability of
S-MWEB. We give applications of this by answering hardness questions regarding
two applied problems in Sect. 3. We conclude this work by raising a few open
problems in the last section.

2 Approximating S-Maximum Edge Biclique is Hard

We start this section by giving two lemmas about CLIQUE, which will be used
in establishing inapproximability for the biclique problems we consider later.
Lemma 1 is a recent result by Zuckerman [20], obtained by a derandomization
of results of Håstad [11]; Lemma 2 follows immediately from Lemma 1.

Lemma 1. ([20]) It is NP-hard to approximate CLIQUE within a factor of
n^{1−ε}, for any ε > 0.

Bipartite Subgraph [8], aims to maximize the number of vertices of a balanced sub- 
graph whereas MEB aims to maximize the total weights on edges in a (not necessarily 
balanced) subgraph. 
Lemma 2. For any constant ε > 0, no polynomial time algorithm can approximate
CLIQUE within a factor of n^{1−ε} with probability at least 1/poly(n) unless
RP = NP.

2.1 A Technical Lemma 

We first describe the construction of a structure called {γ, {α, β}}-Product,
which will be used in the proof of our main technical lemma.

Definition 2. ({γ, {α, β}}-Product)

Input: An instance of S-MWEB on complete bipartite graph G = V1 × V2, where
γ ∈ S and α < γ < β; an integer N.

Output: Complete bipartite graph G^N = V1^N × V2^N constructed as follows:
V1^N and V2^N are N duplicates of V1 and V2, respectively. For each edge
(i, j) ∈ G^N, let (φ(i), φ(j)) be the corresponding edge in G. If
w_G(φ(i), φ(j)) = γ, assign weight α or β to (i, j) independently and
identically at random with expectation being γ, and denote the weight by
random variable X. If w_G(φ(i), φ(j)) ≠ γ, then keep the weight unchanged.
Call the weight function constructed this way w(·).

For any subgraph H of G^N, denote by w_γ(H) (resp., w_{−γ}(H)) the total
weight of H contributed by former-γ-edges (resp., other edges). Clearly,
w(H) = w_γ(H) + w_{−γ}(H).
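
The construction in Definition 2 can be sketched as follows. This is an
illustrative sketch only: the function name `gamma_product`, the matrix
representation of the graph, and the use of index arithmetic for the map φ
are our assumptions. The probability of picking β is the unique value that
makes the expectation equal to γ.

```python
import random

def gamma_product(weights, gamma, alpha, beta, n_copies, rng=None):
    """Sample the weight matrix of a {gamma, {alpha, beta}}-product.

    weights[i][j] is the weight of edge (i, j) in the input graph G.
    Each side of G is duplicated n_copies times; every edge whose
    original weight is gamma is independently re-weighted to alpha or
    beta so that its expected weight is gamma. Other weights are kept.
    """
    if not alpha < gamma < beta:
        raise ValueError("need alpha < gamma < beta")
    rng = rng or random.Random()
    # Pr[X = beta] = p solves p * beta + (1 - p) * alpha = gamma.
    p_beta = (gamma - alpha) / (beta - alpha)
    n_rows, n_cols = len(weights), len(weights[0])
    product = []
    for i in range(n_rows * n_copies):
        row = []
        for j in range(n_cols * n_copies):
            # phi maps a duplicated vertex back to its original in G.
            w = weights[i % n_rows][j % n_cols]
            if w == gamma:
                w = beta if rng.random() < p_beta else alpha
            row.append(w)
        product.append(row)
    return product

# Hypothetical {-1, 0, 1} instance turned into a {-1, 1} instance.
base = [[0, 1], [-1, 0]]
for row in gamma_product(base, gamma=0, alpha=-1, beta=1, n_copies=2):
    print(row)
```

With γ = 0, α = −1 and β = 1 each former-0-edge becomes ±1 with probability
1/2, which is the setting used later for the step from {−1, 0, 1}-MWEB to
{−1, 1}-MWEB.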

With a graph product constructed in this randomized fashion, we have the fol- 
lowing lemma. 

Lemma 3. Given an S-MWEB instance G = (V1, V2, E) where γ ∈ S, and a
number δ ∈ (0, 1/2]; let η = max(|V1|, |V2|),
N = η^{(δ(3−2δ)+3)/(δ(1+2δ))}, G^N = (V1^N, V2^N, E) be the
{γ, {α, β}}-product of G and S' = (S ∪ {α, β}) − {γ}. If

1. |β − α| = O((Nη)^{1/2−δ}); and

2. there is a polynomial time algorithm that approximates the S'-MWEB
instance within a factor of λ, where λ is some arbitrary function in the size
of the S'-MWEB instance

then there exists a polynomial time algorithm that approximates the S-MWEB
instance within a factor of λ, with probability at least 1/poly(n).

Proof. For notational convenience, we denote η^{1/2−δ} by f(η) throughout the
proof. Define random variable Y = X − γ; clearly E[Y] = 0. Suppose there is a
polynomial time algorithm A that approximates S'-MWEB within a factor of λ. We
can then run A on G^N; the output biclique G*_B corresponds to N² bicliques in
G (not necessarily all distinct). Let G*_A be the most weighted among these N²
subgraphs of G. In the rest of the proof we show that with high probability,
G*_A is a λ-approximation of S-MWEB on G.

Denote by E1 the event that G*_B does not imply a λ-approximation on G.
Let ℋ be the set of subgraphs of G^N that do not imply a λ-approximation on G;
clearly, |ℋ| ≤ 4^{Nη}. Let H' be an arbitrary element in ℋ; we have the
following inequalities

Pr{E1} ≤ Pr{at least one element in ℋ is a λ-approximation of G^N}
       ≤ 4^{Nη} · Pr{H' is a λ-approximation of G^N}
       = 4^{Nη} · Pr{E2}

where E2 is the event that H' is a λ-approximation of G^N.

Let the weight of an optimal solution U1 × U2 of G be K, and denote by
U1^N × U2^N the corresponding N²-duplication in G^N. Let x1 and x2 be the
number of former-γ-edges in H' and U1^N × U2^N, respectively. Suppose E2
happens, then we must have

w_{−γ}(H') + x1·γ ≤ N²(K/λ − 1)

where the inequality follows from the fact that we only consider integer
weights. Since w_{−γ}(U1^N × U2^N) = N²K − x2·γ, it implies

(w_γ(H') − x1·γ) − (1/λ)(w_γ(U1^N × U2^N) − x2·γ) ≥ N²

so we have the following statement on probability

Pr{E2} ≤ Pr{(w_γ(H') − x1·γ) − (1/λ)(w_γ(U1^N × U2^N) − x2·γ) ≥ N²}

Let z1 (resp., z2 and z3) be the number of edges in E(H') − E(U1^N × U2^N)
(resp., E(U1^N × U2^N) − E(H') and E(U1^N × U2^N) ∩ E(H')) transformed from
former-γ-edges in G. We have

Pr{(w_γ(H') − x1·γ) − (1/λ)(w_γ(U1^N × U2^N) − x2·γ) ≥ N²}

= Pr{Σ_{i=1}^{z1} Y_i − (1/λ) Σ_{j=1}^{z2} Y_j + ((λ−1)/λ) Σ_{k=1}^{z3} Y_k ≥ N²}

= Pr{Σ_{i=1}^{z1} Y_i + (1/λ) Σ_{j=1}^{z2} (−Y_j) + ((λ−1)/λ) Σ_{k=1}^{z3} Y_k ≥ N²}

≤ Pr{Σ_{i=1}^{z1} Y_i ≥ N²/3} + Pr{(1/λ) Σ_{j=1}^{z2} (−Y_j) ≥ N²/3}
  + Pr{((λ−1)/λ) Σ_{k=1}^{z3} Y_k ≥ N²/3}

≤ Pr{Σ_{i=1}^{z1} Y_i ≥ N²/3} + Pr{Σ_{j=1}^{z2} (−Y_j) ≥ N²/3}
  + Pr{Σ_{k=1}^{z3} Y_k ≥ N²/3}

≤ Σ_{i∈{1,2,3}} exp(−2 z_i (N² / (3 z_i c1 f(Nη)))²)    (Hoeffding bound)

≤ 3 · exp(−c2 · N² / (η² f(Nη)²))                        (z_i ≤ η²N²)

where c1, c2 are constants (c2 > 0). Now if we set N = η^θ for some θ > 0, we
have

Pr{E1} ≤ 4^{Nη} · Pr{E2} ≤ 3 · exp(ln 4 · η^{1+θ} − c2 · η^{θ(1+2δ)−(3−2δ)})

For this probability to be bounded by 1/2 when η is large enough, we need to
have θ(1+2δ) − (3−2δ) > 1 + θ. Solving this inequality gives θ > (2−δ)/δ.
Therefore, for any δ ∈ (0, 1/2], by setting N = η^{(δ(3−2δ)+3)/(δ(1+2δ))} we
have Pr{E1}, i.e. the probability that the solution returned by A does not
imply a λ-approximation of G, bounded from above by 1/2 once the input size is
large enough. This gives a polynomial time algorithm that approximates S-MWEB
within a factor of λ with probability at least 1/2. □
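
As a sanity check on the choice of N in Lemma 3, the sketch below evaluates
the relevant exponents of η exactly with rational arithmetic. It rests on an
assumption that is our reading of the proof rather than a formula quoted from
it: with N = η^θ, the union-bound term 4^{Nη} grows like exp(ln 4 · η^{1+θ}),
while the Hoeffding term decays like exp(−c2 · η^{θ(1+2δ)−(3−2δ)}). The
function names are hypothetical.

```python
from fractions import Fraction

def theta(delta):
    """Exponent of eta in N = eta**theta, as chosen in Lemma 3."""
    return (delta * (3 - 2 * delta) + 3) / (delta * (1 + 2 * delta))

def exponents(delta):
    """Return (growth, decay) exponents of eta under the stated assumption."""
    t = theta(delta)
    growth = 1 + t                                 # from 4**(N * eta)
    decay = t * (1 + 2 * delta) - (3 - 2 * delta)  # from the Hoeffding term
    return growth, decay

for delta in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):
    growth, decay = exponents(delta)
    print(f"delta={delta}: theta={theta(delta)}, growth={growth}, decay={decay}")
```

Under this assumption the decay exponent simplifies to 3/δ, which exceeds the
growth exponent for every δ in (0, 1/2], so the failure probability vanishes
as η grows.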

This lemma immediately leads to the following corollary.

Corollary 1. Following the construction in Lemma 3, if S'-MWEB can be
approximated within a factor of n^{ε'}, for some ε' > 0, then there exists a
polynomial time algorithm that approximates S-MWEB within a factor of n^ε,
where ε = (1 + (δ(3−2δ)+3)/(δ(1+2δ))) ε', with probability at least
1/poly(n).²

Proof. Let |G| and |G^N| be the number of nodes in the S-MWEB and S'-MWEB
problem, respectively. Since |G^N| = N|G| ≤ |G|^{1 + (δ(3−2δ)+3)/(δ(1+2δ))},
our claim follows from Lemma 3. □



2.2 {-1,0,1}-MWEB 

In this section, we prove inapproximability of {−1, 0, 1}-MWEB by giving a
reduction from CLIQUE; in subsequent sections, we prove inapproximability
results for more general S-MWEB by constructing randomized reductions from
{−1, 0, 1}-MWEB.

Lemma 4. The decision version of the {−1, 0, 1}-MWEB problem is NP-complete.

Proof. We prove this by describing a reduction from CLIQUE. Given a CLIQUE
instance G = (V, E), construct G' = (V', E') such that V' = V1 ∪ V2 where
V1, V2 are duplicates of V in that there exist bijections φ1 : V1 → V and
φ2 : V2 → V. And

E' = E1 ∪ E2 ∪ E3

E1 = {(u, v) | u ∈ V1, v ∈ V2 and (φ1(u), φ2(v)) ∈ E}

E2 = {(u, v) | u ∈ V1, v ∈ V2, φ1(u) ≠ φ2(v) and (φ1(u), φ2(v)) ∉ E}

E3 = {(u, v) | u ∈ V1, v ∈ V2, and φ1(u) = φ2(v)}

Clearly, G' is a biclique. Now assign weight 0 to edges in E1, −1 to edges in
E2 and 1 to edges in E3. We then claim that there is a clique of size k in G
if and only if there is a biclique of total edge weight k in G'.

First consider the case where there is a clique of size k in G. Let U be the
set of vertices of the clique; then taking the subgraph induced by
φ1^{−1}(U) × φ2^{−1}(U) in G' gives us a biclique of total weight k.

Now suppose that there is a biclique U1 × U2 of total weight k in G'. Without
loss of generality, assume U1 and U2 correspond to the same subset of vertices in

² Note we are slightly abusing notation here by always representing the size of a given
problem under discussion by n. Here n refers to the size of S'-MWEB (resp.
S-MWEB) when we are talking about approximation factor n^{ε'} (resp. n^ε). We adopt
the same convention in the sequel.
V because if (φ1(U1) − φ2(U2)) ∪ (φ2(U2) − φ1(U1)) is not empty, then removing
(U1 − U2) ∪ (U2 − U1) will never decrease the total weight of the solution.
Given φ1(U1) = φ2(U2), we argue that there is no edge of weight −1 in biclique
U1 × U2; suppose otherwise there exists a weight −1 edge (i1, j2) (i1 ∈ U1,
and j2 ∈ U2), then the corresponding edge (j1, i2) (j1 ∈ U1, and i2 ∈ U2) must
be of weight −1 too, and removing i1, i2 from the solution biclique will
increase total weight by at least 1 because among all edges incident to i1 and
i2, (i1, i2) is of weight 1, (i1, j2) and (i2, j1) are of weight −1 and the
rest are of weights either 0 or −1.

Therefore, we have shown that if there is a solution U1 × U2 of weight k in
G', U1 and U2 correspond to the same set of vertices U and U is a clique of
size k. It is clear that the reduction can be performed in polynomial time and
the problem is in NP, and thus NP-complete. □
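
The reduction in the proof of Lemma 4 can be checked on a small graph with
the following sketch. The function names and the example graph are
hypothetical, vertices are assumed to be numbered 0..n−1 so that φ1 and φ2
are identity maps, and both optimisation routines are brute-force searches
meant only for tiny inputs.

```python
from itertools import combinations

def clique_to_mweb(n, edges):
    """Weight matrix of G' for a graph G on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    weights = [[0] * n for _ in range(n)]
    for u in range(n):          # u ranges over V1
        for v in range(n):      # v ranges over V2
            if u == v:
                weights[u][v] = 1       # E3: two copies of the same vertex
            elif frozenset((u, v)) not in edge_set:
                weights[u][v] = -1      # E2: distinct and non-adjacent in G
            # otherwise E1: adjacent in G, weight stays 0
    return weights

def max_clique_size(n, edges):
    """Size of a largest clique, by exhaustive search."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for cand in combinations(range(n), size):
            if all(frozenset(p) in edge_set for p in combinations(cand, 2)):
                return size
    return 0

def max_biclique_weight(weights):
    """Largest total weight of a non-empty biclique, by exhaustive search."""
    n = len(weights)
    sides = [c for size in range(1, n + 1)
             for c in combinations(range(n), size)]
    return max(sum(weights[u][v] for u in rows for v in cols)
               for rows in sides for cols in sides)

# Hypothetical instance: a triangle 0-1-2 with a path 2-3-4 attached.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
weights = clique_to_mweb(5, edges)
print(max_clique_size(5, edges), max_biclique_weight(weights))
```

On this instance both values coincide, as the lemma asserts: the largest
clique has 3 vertices and the heaviest biclique of G' has weight 3.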

Given Lemma 1, the following theorem follows immediately from the above
reduction.

Theorem 1. For any constant ε > 0, no polynomial time algorithm can
approximate {−1, 0, 1}-MWEB within a factor of n^{1−ε} unless P = NP.

Proof. The reduction given in the proof of Lemma 4 preserves
inapproximability exactly, and given that CLIQUE is hard to approximate within
a factor of n^{1−ε} unless P = NP, the theorem follows. □

Theorem 2. For any constant ε > 0, no polynomial time algorithm can
approximate {−1, 0, 1}-MWEB within a factor of n^{1−ε} with probability at
least 1/poly(n) unless RP = NP.

Proof. If there exists such a randomized algorithm for {−1, 0, 1}-MWEB,
combining it with the reduction given in Lemma 4, we obtain an RP algorithm for
CLIQUE. This is impossible unless RP = NP. □



2.3 {-1,1}-MWEB 

Lemma 5. If there exists a polynomial time algorithm that approximates
{−1, 1}-MWEB within a factor of n^ε, then there exists a polynomial time
algorithm that approximates {−1, 0, 1}-MWEB within a factor of n^{5ε} with
probability at least 1/poly(n).

Proof. We prove this by constructing a {γ, {α, β}}-Product from
{−1, 0, 1}-MWEB to {−1, 1}-MWEB by setting γ = 0, α = −1 and β = 1. Since
δ = 1/2, according to Corollary 1 it is sufficient to set N = η^4 so that the
probability of obtaining an n^{5ε}-approximation for {−1, 0, 1}-MWEB is at
least 1/poly(n). □

Theorem 3. For any constant ε > 0, no polynomial time algorithm can
approximate {−1, 1}-MWEB within a factor of n^{1/5−ε} with probability at
least 1/poly(n) unless RP = NP.

Proof. This follows directly from Theorem 2 and Lemma 5. □
2.4 {−η^{1/2−δ}, 1}-MWEB and {−η^{δ−1/2}, 1}-MWEB

In this section, we consider the generalized cases of the S-MWEB problem.

Theorem 4. For any δ ∈ (0, 1/2], there exists some constant ε such that no
polynomial time algorithm can approximate {−η^{1/2−δ}, 1}-MWEB within a factor
of n^ε with probability at least 1/poly(n) unless RP = NP. The same statement
holds for {−η^{δ−1/2}, 1}-MWEB.

Proof. We prove this by first constructing a {γ, {α, β}}-Product from
{−1, 1}-MWEB to {−η^{1/2−δ}, 1}-MWEB by setting γ = −1, α = −(Nη)^{1/2−δ} and
β = 1. By Corollary 1, we know that for any δ ∈ (0, 1/2], if there exists a
polynomial time algorithm that approximates {−η^{1/2−δ}, 1}-MWEB within a
factor of n^ε, then there exists a polynomial time algorithm that approximates
{−1, 1}-MWEB within a factor of n^{(1 + (δ(3−2δ)+3)/(δ(1+2δ))) ε}, with
probability at least 1/poly(n). So invoking the hardness result in Theorem 3
gives the desired hardness result for {−η^{1/2−δ}, 1}-MWEB.

The same conclusion applies to {−1, η^{1/2−δ}}-MWEB by setting γ = 1, α = −1
and β = (Nη)^{1/2−δ}. Since η^{1/2−δ} is a constant for any given graph, we can
simply divide each weight in {−1, η^{1/2−δ}} by η^{1/2−δ}. □

Theorem 4 leads to the following general statement.

Theorem 5. For any small constant δ ∈ (0, 1/2], if
|min S / max S| ∈ Ω(η^{δ−1/2}) ∩ O(η^{1/2−δ}), then there exists some constant
ε such that no polynomial time algorithm can approximate S-MWEB within a
factor of n^ε with probability at least 1/poly(n) unless RP = NP.

3 Two Applications 

In this section, we describe two applications of the results established in
Sect. 2 by proving hardness and inapproximability of problems found in practice.

3.1 SAMBA Model is Hard 

Microarray technology has been the latest technological breakthrough in biolog- 
ical and biomedical research; in many applications, a key step in analyzing gene 
expression data obtained through microarray is the identification of a bicluster 
satisfying certain properties and with largest area (see the survey [13] for a fairly 
extensive discussion on this). 

In particular, Tanay et al. [15] considered the Statistical-Algorithmic Method
for Bicluster Analysis (SAMBA) model. In their formulation, a complete
bipartite graph is given where one side corresponds to genes and the other side
corresponds to conditions. An edge (u, v) is assigned a real weight which could
be either positive or negative, depending on the expression level of gene u in
condition v, in a way such that heavy subgraphs correspond to statistically
significant biclusters. Two weight-assigning schemes are considered in their
paper. In the first, or simple statistical model, a tight upper bound on the
probability of an observed bicluster is computed; in the second, or refined
statistical model, the weights are assigned in a way such that a maximum weight
biclique subgraph corresponds to a maximum likelihood bicluster.

The Simple SAMBA Statistical Model: Let H = (V1', V2', E') be a subgraph
of G = (V1, V2, E), Ē' = {V1' × V2'} − E' and p = |E| / (|V1||V2|). The simple
statistical model assumes that edges occur independently and identically at
random with probability p. Denote by BT(k, p, n) the probability of observing
k or more successes in n binomial trials; the probability of observing a graph
at least as dense as H is thus p(H) = BT(|E'|, p, |V1'||V2'|). This model
assumes p < 1/2 and |V1'||V2'| ≪ |V1||V2|, therefore p(H) is upper bounded by

p*(H) = 2^{|V1'||V2'|} · p^{|E'|} · (1 − p)^{|V1'||V2'| − |E'|}

The goal of this model is thus to find a subgraph H with the smallest p*(H).
This is equivalent to maximizing

−log p*(H) = |E'|(−1 − log p) + (|V1'||V2'| − |E'|)(−1 − log(1 − p))

which is essentially solving an S-MWEB problem that assigns either positive
weight (−1 − log p) or negative weight (−1 − log(1 − p)) to an edge (u, v),
depending on whether gene u expresses or not in condition v, respectively. The
summation of edge weights over H is defined as the statistical significance of H.
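
The weight assignment of the simple model can be sketched as below. This is a
minimal illustration under two assumptions of ours: logarithms are taken base
2, and the input is a 0/1 matrix whose entry (u, v) records whether gene u is
expressed in condition v. The function names and the example matrix are
hypothetical.

```python
import math

def simple_samba_weights(adjacency):
    """Return (p, weights) for the simple statistical model.

    p is the overall edge density; an edge gets weight -1 - log2(p)
    and a non-edge gets weight -1 - log2(1 - p).
    """
    n_rows, n_cols = len(adjacency), len(adjacency[0])
    p = sum(map(sum, adjacency)) / (n_rows * n_cols)
    if not 0 < p < 0.5:
        raise ValueError("the model assumes 0 < p < 1/2")
    w_edge = -1 - math.log2(p)          # positive because p < 1/2
    w_non_edge = -1 - math.log2(1 - p)  # negative because 1 - p > 1/2
    weights = [[w_edge if a else w_non_edge for a in row] for row in adjacency]
    return p, weights

def significance(weights, rows, cols):
    """Statistical significance of the bicluster rows x cols."""
    return sum(weights[i][j] for i in rows for j in cols)

def neg_log_p_star(adjacency, p, rows, cols):
    """Closed form |E'|(-1 - log p) + (|V1'||V2'| - |E'|)(-1 - log(1 - p))."""
    area = len(rows) * len(cols)
    n_edges = sum(adjacency[i][j] for i in rows for j in cols)
    return (n_edges * (-1 - math.log2(p))
            + (area - n_edges) * (-1 - math.log2(1 - p)))

# Hypothetical expression matrix: 4 genes by 4 conditions, density 5/16.
adjacency = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
p, weights = simple_samba_weights(adjacency)
print(significance(weights, (0, 1), (0, 1)))
```

Summing the edge weights over a bicluster reproduces the closed-form
expression for −log p*(H), which is why maximizing significance is an
instance of S-MWEB with a two-element weight set.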

Since 4^ < p < 5, asymptotically we have ~^Zl°3io\~p^^ ^ n 0(1). 

Invoking Theorem 5 gives the following.

Theorem 6. For the Simple SAMBA Statistical model, there exists some ε > 0
such that no polynomial time algorithm, possibly randomized, can find a bicluster
whose statistical significance is within a factor of n^ε of optimal unless RP = NP.

The Refined SAMBA Statistical Model: In the refined model, each edge
(u, v) is assumed to take an independent Bernoulli trial with parameter
p_{u,v}, therefore

p(H) = (∏_{(u,v)∈E'} p_{u,v}) · (∏_{(u,v)∈Ē'} (1 − p_{u,v}))

is the probability of observing a subgraph H. Since p(H) generally decreases
as the size of H increases, Tanay et al. aim to find a bicluster with the
largest (normalized) likelihood ratio

L(H) = ((∏_{(u,v)∈E'} p_c) · (∏_{(u,v)∈Ē'} (1 − p_c))) / p(H)

where p_c > max_{(u,v)} p_{u,v} is a constant probability chosen with
biologically sound assumptions. Note this is equivalent to maximizing the
log-likelihood ratio

log L(H) = Σ_{(u,v)∈E'} log(p_c / p_{u,v}) + Σ_{(u,v)∈Ē'} log((1 − p_c) / (1 − p_{u,v}))

With this formulation, each edge is assigned weight either
log(p_c / p_{u,v}) > 0 or log((1 − p_c) / (1 − p_{u,v})) < 0, and finding the
most statistically significant bicluster is equivalent to solving S-MWEB with
S = {log((1 − p_c) / (1 − p_{u,v})), log(p_c / p_{u,v})}. Since p_c is a constant
and ^ < < p„ we have e n 0(1). Invoking
Theorem 5 gives the following.

Theorem 7. For the Refined SAMBA Statistical model, there exists some ε > 0
such that no polynomial time algorithm, possibly randomized, can find a bicluster
whose log-likelihood is within a factor of n^ε of optimal unless RP = NP.

3.2 Minimum Description Length with Holes (MDLH) is Hard 

Bu et al. considered the Minimum Description Length with Holes problem
(defined in the following); the 2-dimensional case is claimed NP-hard in that
paper and the proof is referred to [3]. However, the proof given in [3] suffers
from an error in its reduction³, thus whether MDLH is NP-complete remains
unsettled. In this section, by employing the results established in the previous
sections, we show that no polynomial time algorithm exists for MDLH, under
the slightly weaker (than P ≠ NP) but widely believed assumption RP ≠ NP.

We first briefly describe the Minimum Description Length summarization
with Holes problem; for a detailed discussion of the subject, we refer the readers
to [3,4].

Suppose one is given a k-dimensional binary matrix M, where each entry is
of value either 1, which is of interest, or of value 0, which is not of
interest. Besides, there are also k hierarchies (trees), one associated with
each dimension, namely T1, T2, ..., Tk, of heights l1, l2, ..., lk
respectively. Define level l = max_i(l_i). For each T_i, there is a bijection
between its leaves and the 'hyperplanes' in the ith dimension (e.g. in a
2-dimensional matrix, these hyperplanes correspond to rows and columns). A
region is a tuple (x1, x2, ..., xk), where x_i is a leaf node or an internal
node in hierarchy T_i. Region (x1, x2, ..., xk) is said to cover cell
(c1, c2, ..., ck) if c_i is a descendant of x_i, for all 1 ≤ i ≤ k. A
k-dimensional l-level MDLH summary is defined as two sets S and H, where 1) S
is a set of regions covering all the 1-entries in M; and 2) H is the set of
0-entries covered (undesirably) by S and to be excluded from the summary. The
length of a summary is defined as |S| + |H|, and the MDLH problem asks whether
there exists an MDLH summary of length at most K, for a given K > 0.

In an effort to establish hardness of MDLH, we first define the following 
problem, which serves as an intermediate problem bridging {—1, 1}-MWEB and 
MDLH. 

Definition 3. (Problem V) 

Instance: A complete bipartite graph G = {Vi,V2, E) where each edge takes on 
a value in { — 1, 1}, and a positive integer k. 

Question: Does there exist an induced subgraph (a biclique U1 × U2) whose
total weight of edges is ω, such that |U1| + |U2| + ω ≥ k?

Lemma 6. No polynomial time algorithm exists for Problem V unless RP = NP. 
³ In Lemma 3.2.1 of [3], the reduction from CLIQUE to CEW is incorrect.
Proof. We prove this by constructing a reduction from {−1, 1}-MWEB to
Problem V as follows: for the given input biclique G = (V1, V2, E), make N
duplicates of V1 and N duplicates of V2, where N = (|V1| + |V2|)². Connect each
copy of V1 to each copy of V2 in a way that is identical to the input biclique.
We then claim that there is a size k solution to {−1, 1}-MWEB if and only if
there is a size N²k solution to Problem V.

If there is a size k solution to {−1, 1}-MWEB, then it is straightforward that
there is a solution to Problem V of size at least N²k. For the reverse
direction, we show that if no solution to {−1, 1}-MWEB is of size at least k,
then the maximum solution to Problem V is strictly less than N²k. Note a
solution U1^N × U2^N to Problem V consists of at most N² (not necessarily all
distinct) solutions to {−1, 1}-MWEB, and each of them can contribute at most
(k − 1) in weight to U1^N × U2^N, so the total weight gained from edges is at
most N²(k − 1). And note the total weight gained from vertices is at most
N(|V1| + |V2|) = N√N, therefore the weight is upper bounded by
N√N + N²(k − 1) < N²k and this completes the proof.

As a conclusion, we have a polynomial time reduction from {−1, 1}-MWEB
to Problem V. Since no polynomial time algorithm exists for {−1, 1}-MWEB
unless RP = NP, the same holds for Problem V. □

Theorem 8. No polynomial time algorithm exists for MDLH summarization, 
even in the 2-dimension 2-level case, unless RP = NP. 

Proof. We prove this by showing that Problem V is a complementary problem
of 2-dimensional 2-level MDLH.

Let the input 2D matrix M be of size n1 × n2, with a tree of height 2
associated with each dimension. Without loss of generality, we only consider
the 'sparse' case where the number of 1-entries is less than the number of
0-entries by at least 2 so that the optimal solution will never contain the
whole matrix as one of its regions. Let S be the set of regions in a solution.
Let R and C be the set of rows and columns not included in S. Let Z be the set
of all zero entries in M. Let z be the total number of zero entries in the
R × C 'leftover' matrix and let w be the total number of 1-entries in it. MDLH
tries to minimize the following:

(n1 − |R|) + (n2 − |C|) + (|Z| − z) + w = (n1 + n2 + |Z|) − (|R| + |C| + z − w)

Since (n1 + n2 + |Z|) is a fixed quantity for any given input matrix, the
2-dimensional 2-level MDLH problem is equivalent to maximizing
(|R| + |C| + z − w), which is precisely the definition of Problem V.
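
The identity used in the proof can be checked exhaustively on a small matrix
with the sketch below. It follows our reading of the proof: the summary uses
every row outside R and every column outside C as a region, the holes are the
0-entries outside the R × C leftover matrix, and each 1-entry inside the
leftover matrix is covered by a single-cell region. The function name and the
example matrix are hypothetical.

```python
from itertools import combinations

def mdlh_terms(matrix, kept_rows, kept_cols):
    """Return (length, objective, constant) for leftover rows R and columns C.

    length    = (n1 - |R|) + (n2 - |C|) + (|Z| - z) + w
    objective = |R| + |C| + z - w
    constant  = n1 + n2 + |Z|
    """
    n1, n2 = len(matrix), len(matrix[0])
    R, C = set(kept_rows), set(kept_cols)
    total_zeros = sum(row.count(0) for row in matrix)
    z = sum(1 for i in R for j in C if matrix[i][j] == 0)
    w = sum(1 for i in R for j in C if matrix[i][j] == 1)
    length = (n1 - len(R)) + (n2 - len(C)) + (total_zeros - z) + w
    objective = len(R) + len(C) + z - w
    return length, objective, n1 + n2 + total_zeros

def all_subsets(n):
    """Yield every subset of range(n), including the empty one."""
    for size in range(n + 1):
        yield from combinations(range(n), size)

# Hypothetical sparse 3 x 3 input matrix.
matrix = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
best = min(mdlh_terms(matrix, R, C)[0]
           for R in all_subsets(3) for C in all_subsets(3))
print(best)
```

In the R × C leftover matrix a 0-entry contributes +1 and a 1-entry
contributes −1 to the objective, which matches a {−1, 1}-weighted biclique
together with the vertex count |R| + |C|.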

Therefore, 2-dimensional 2-level MDLH is a complementary problem to
Problem V, and by Lemma 6 we conclude that no polynomial time algorithm exists
for 2-dimensional 2-level MDLH unless RP = NP. □

4 Concluding Remarks 

Maximum weighted edge biclique and its variants have received much attention
in recent years because of their wide range of applications in various fields
including machine learning, databases, and particularly bioinformatics and
computational biology, where many computational problems for the analysis of
microarray data are closely related. To tackle these applied problems, various
kinds of heuristics have been proposed and experimented with, and it is not
known whether these algorithms give provable approximations. In this work, we
answer this question by showing that it is highly unlikely (under the
assumption RP ≠ NP) that good polynomial time approximation algorithms exist
for maximum weighted edge biclique for a wide range of choices of weight; and
we further give specific applications of this result to two applied problems.
We conclude our work by listing a few open questions.

1. We have shown that {Θ(−η^δ), 1}-MWEB is n^ε-inapproximable for
δ ∈ (−1/2, 1/2); also it is easy to see that (i) the problem is in P when
δ ≤ −1, where the entire input graph is the optimal solution; (ii) for any
δ ≥ 1, the problem is equivalent to MEB, which is conjectured to be
n^ε-inapproximable [8]. Therefore it is natural to ask what is the
approximability of the {−η^δ, 1}-MWEB problem when δ ∈ (−1, −1/2] and
δ ∈ [1/2, 1). In particular, can this be answered by a better analysis of
Lemma 3?

2. We are especially interested in {−1, 1}-MWEB, which is closely related
to the formulations of many natural problems [1,3,4,18]. We have shown that
no polynomial time algorithm exists for this problem unless RP = NP, and we
believe this problem is NP-complete; however, a proof has eluded us so far.